Bayesian posterior density estimation reveals degeneracy in three-dimensional multiple emitter localization

Single-molecule localization microscopy requires sparse activation of emitters to circumvent the diffraction limit. In densely labeled or thick samples, overlap of emitter images is inevitable. Single-molecule localization of these samples results in a biased parameter estimate with a wrong model of the number of emitters. On the other hand, multiple emitter fitting suffers from point spread function degeneracy, which increases model and parameter uncertainty. To better estimate the model, parameters and uncertainties, a three-dimensional Bayesian multiple emitter fitting algorithm was constructed using Reversible Jump Markov Chain Monte Carlo. It reconstructs the posterior density of both the model and the parameters, namely the three-dimensional position and photon intensity, of overlapping emitters. The ability of the algorithm to separate two emitters at varying distance was evaluated using an astigmatic point spread function. We found that for astigmatic imaging, the posterior distribution of the emitter positions is multimodal when emitters are within two times the in-focus standard deviation of the point spread function. This multimodality describes the ambiguity in position that astigmatism introduces in localization microscopy. Biplane imaging was also tested, proving capable of separating emitters up to 0.75 times the in-focus standard deviation of the point spread function while staying free of multimodality. The posteriors seen in astigmatic and biplane imaging demonstrate how the algorithm can identify point spread function degeneracy and evaluate imaging techniques for three-dimensional multiple-emitter fitting performance.

Histogram of all visited models across all frames.Emitter intensities were sampled from N (2000, 150), background was set to 20 photons.Emitters were placed in focal plane at the center of the region of interest (ROI) and given a sub-pixel shift drawn from U (−0.5, 0.5). 100 frames were used.Supplementary Figure S13.Probability of error between modes for the two emitter problem using astigmatic and biplane imaging.Plotted for varying emitter and background intensities.a) Probability of error when selecting modes with astigmatic imaging.b) Probability of error when selecting modes using biplane imaging.Emitters were placed in focal plane, separated by 5/4 σ PSF along the x-axis.All uniform priors except for emitter intensity, which used N (I sample , 3.5 I sample ).Probability of error was calculated using the same methods as for Supplementary Figure S12 described in the Supplementary Note on multimodality in two emitter astigmatic imaging.Reconstructions without alternate mode used a probability of error of 0, while reconstructions that could not separate emitters used a probability of error of 1.
move type probability RJMCMC burn RJMCMC MCMC Single Emitter  Finally, the prior for the intensity should be either determined experimentally by kernel density fitting SMLM results, or directly defined by the user given sufficient knowledge of the emission response.For the testing on synthetic data, often (see Fazel et al. 1 ) a Gaussian prior around the expected emitter intensity is used, with an additional flat shoulder at lower intensities to facilitate model jumps if necessary, such as: The prior on intensity is assumed to be unchanging over depth.

Moves
In RJMCMC, the design of the moves is crucial for exploring the model space.The moves used in 3D RJMCMC localization are: • A single emitter move, varying the 3D position and intensity of a single emitter; • A group move, varying the 3D position and intensity of a cluster of emitters; • A background move, changing the offset background estimate; • A split move, splitting one emitter into two; • A merge move, merging two emitters into one; • A generalized split move, splitting a cluster of N emitters into N + 1; • A generalized merge move, merging a cluster of N emitters into N − 1; • A birth move, spawning an emitter where the difference between measured and expected intensity is high; • A death move, removing an emitter at random.
The single emitter, group, and background move work in parameter space while leaving the model space identical.The rest of the moves come in pairs to retain detailed balance and change both the model and the parameters of the estimate.Supplementary Figure S1 shows schematically how each move changes the parameters or model.

Move proposal
At every iteration of the algorithm, a move is selected according to fixed probabilities defined by the user.Move probabilities can be set separately for the MCMC portion, as well as both the burn-in phase and the post burn-in phase of the RJMCMC portion.An appropriate setting of the move probabilities results in sufficient mixing between chains.As a result of chain mixing, the algorithm makes jumps between multiple solution modes, which allows exploration of the full posterior density.Specifically for our problem, this allows us to explore ambiguous localization solutions due to PSF degeneracy.
To facilitate sufficient mixing, the following procedure was used to determine the move probabilities in Supplementary Table S1.To choose the probabilities related to 2D moves, the parameters from Fazel et al. 1 were initially chosen.For the 3D probabilities, the starting point was based on our presumed depth of view.Given the increased complexity when moving from 2D RJMCMC to 3D RJMCMC, we identified that these parameters needed to be tuned.As such, these parameters were tuned until sufficient mixing between the chains was observed.For testing on synthetic data, the move probabilities in Supplementary Table S1 were found to work well.

Single emitter move
The single emitter, group, and background move moves will act on the parameter space only and their acceptance rates can be determined from the Metropolis-Hastings algorithm for the MCMC case.

16/28
Metropolis-Hastings acceptance rate for parameter space jumps The probability of accepting parameter space jumps is given as: with α the acceptance rate, using the joint probability P(θ , k, D) = P(D|θ , k)P(θ |k)P(k) to calculate the posterior ratio (ignoring evidence term P(D) as it gets cancelled in the division) and the final fraction as the determinant of the Jacobian of the function θ ′ (θ , u) that generates the new parameters.Here, θ = θ x,0 θ y,0 θ z,0 θ I,0 . . .θ x,i θ y,i θ z,i θ I,i θ b is the parameter vector, containing the positions and intensities of the i emitters used in the current model along the background θ b .Using a random walk sampler for the parameter space moves and cancelling the model prior, the acceptance rate reduces to: The single emitter move randomly selects one of the i current emitters and generates a new estimate out of its previous parameters as follows: θ ′ y, j = θ y, j + u y (10) with u drawn from the following normal distributions: and σ x , σ y , σ z , σ I some user determined hyperparameters to vary the jump size.The posterior ratio in Equation 8 can now be simplified to: Assuming the estimated parameters stay within the bounds of the priors, the uniform position priors cancel out and only the likelihood ratio and intensity prior ratio are left.

Group move
The group move first finds a cluster of nearby emitters, and then executes a single emitter move on each of them.It randomly picks one emitter, then searches for emitters within a 2σ PSF sphere around it to form a cluster.This radius can be treated as a hyperparameter and tuned, though for Gaussian and astigmatic PSFs it works well when set to the PSF width in focal plane.More complex 3D PSFs may require a fairly large radius for clustering.After finding an eligible cluster and executing the moves, the acceptance rate can be calculated using the same equations as in the single move, namely as in Equations 8 and 17.
If a cluster cannot be found, the move fails.

Background move
The background move is not coupled to any emitter.It updates a flat offset background intensity with a random walk sampler as follows: with σ b another hyperparameter for tuning the background jump size.The acceptance rate from Equation 17 can now be reduced even further, assuming the background jump stays within the bounds of its uniform prior: which is just the likelihood ratio of the parameters before and after the jump.

Split and Merge pair
The (generalized) split, merge, birth and death moves come in pairs and may act on both the parameter and the model space.
Their acceptance rates can be determined from the general acceptance rate used in RJMCMC.

Metropolis-Hastings acceptance rate for model space jumps
The probability of accepting model space jumps in the general case is given as: with r m (θ ) the probability of choosing the move type (selecting either split or merge) and q(u) the probability density function of the draws from u used to generate the split weights, means, and standard deviations.The split and merge move turn one emitter into two and vice versa.By using this move pair, the algorithm has a move that allows it to distinguish emitters that are in close proximity to one another.Split and merge will uphold the following constraints: θ I, j * θ y, j * = θ I, j 1 θ y, j 1 + θ I, j 2 θ y, j 2 (24) Split For a split, θ ′ is generated from a randomly selected emitter with index j * as follows: θ y, j * θ I, j * = θ y, j 1 θ I, j 1 + θ y, j 2 θ I, j 2 (38) with the two new emitters retaining the 3D center of mass and total intensity of the emitter they split from.Indices j 1 and j 2 are used to indicate the emitters resulting from a split, or those used in a merge, whereas index j * is used for the result of a merge or the target of a split.

18/28
The ratio of selecting the move types then becomes: and q(u) is: q(u) = P(u 1 )P(u 2 )P(u 3 )P(u 4 ) Knowing the split move only influences these parameters and leaves the other emitters and background untouched, the emitter to be split and the resulting emitters after the split can be moved to the back for both θ ′ and (θ , u) vectors.The resulting vectors then become: The Jacobian can then be constructed as: with identity matrix I k−1 of dimension k − 1 for all the unchanged parameters and 0 k−1,1 , 0 1,k−1 column and row vectors containing zeros.This yields the determinant: Setting P split = P merge , the final result is then: Merge For a merge move, an emitter is selected at random, after which it is randomly paired with another emitter within 2 σ PSF of itself.If no emitter can be found within this distance, the move fails.The emitters combine their intensity and center of mass.The acceptance rate can be set to the inverse of that for the split move, finding u 1 , u 2 , u 3 and u 4 deterministically from Equations 31, 33, 34, and 35: This gives the the acceptance rate: with θ previously being the parameters before the split and now thus the parameters after the merge and vice versa for θ ′ .

Single emitter localization convergence for an astigmatic PSF
To verify if the algorithm works correctly, it is first tested on frames where only a single emitter is active.Synthetic data is generated for an emitter in focal plane of a 20 by 20 pixel ROI, assuming an effective pixel size of 100 nm.Simulated emitters are placed in the center of the ROI, generating 50 frames, each with a random subpixel shift in x and y drawn from a uniform distribution and intensities drawn from a normal distribution: θ y ∼ U (10 − .5, 10 + .5)(84) with the position in pixels, and I sim the mean intensity for the synthetic dataset.The background photon count is fixed at 20 photons.The algorithm is tested on data with I sim = 2000 and I sim = 500 to demonstrate convergence at varying SBR.
The hyperparameters used for localization are as follows.The RJMCMC portion ran for 20000 burn-in iterations, followed by another 10000 iterations from which the model is determined.MCMC is then run for another 5000 iterations, from which the final positions of the emitters are determined.The parameters for the random walk samplers within the RJMCMC and MCMC portions are detailed in Supplementary Table S2, while the move probabilities were taken from Supplementary Table S1.The depth was constrained to a range of −1, 1 µm.
The priors for position and background are uniform, with a custom prior used for intensity.The number of emitters was constrained to k max = 6.The algorithm was initialized randomly with k max active emitters drawn from the priors, ensuring the initial model is incorrect and allowing for meaningful conclusions from the model time series and autocorrelation.

High SBR results
For I sim = 2000, the prior on intensity is given as follows: The prior is plotted in Supplementary Figure S2.
Model convergence To verify the model has converged, the time series of the models over all frames as well as the average autocorrelation of the model is studied.In Supplementary Figure S3, the model time series and autocorrelation are plotted.
From the time series, we can see that from the initialization at k max = 6 emitters, the estimated model rapidly reduces to a two or one emitter model.There is notably little mixing in the model chain.This may be due to the simplicity of localizing a single emitter, the algorithm therefore having infinitesimal probabilities of jumping to models with 3 emitters or more.Note that after the burn-in fraction, the split and merge move are turned off and there is no more change in the model found.With split and merge disabled, and G-split and G-merge failing since no clusters can be made out of one emitter, the only moves capable of changing the model are birth and death.At this point, the RJMCMC seems to have converged strongly.Evaluating the average model chain autocorrelation, it is clear that after 2000 RJMCMC iterations, the autocorrelation has gone to zero.This gives an idea of the chain length required to estimate the model distribution under these conditions.Some multiple of 2000, say 6000 RJMCMC iterations, may already be sufficient for model convergence on this problem.From the histogram in Supplementary Figure S3 d) it is clear that on average, the estimates spend nearly all their time in a model with one active emitter.

23/28
Parameter convergence Again, we look to the autocorrelation of the parameters to assess convergence.In Supplementary Figure S3 b), the average autocorrelations of all parameters for their MCMC chains are plotted.Note that the MCMC chain was initalized from the last RJMCMC iteration that had the correct model, therefore the parameters can be expected to converge quickly and there is no need to discard the initial part of the chain.The autocorrelations of the lateral position and background decrease almost immediately to zero.Note that while the autocorrelation for axial position drops to zero in the same time, there are some slower transients visible in the plot.The same holds for the autocorrelation of intensity, which also takes significantly longer to reach zero.The acceptance rates for the moves are 0, 0.47 ± 0.02, 0.28 ± 0.01 for the group move, single emitter move, and background move, respectively.The group move cannot find a cluster and is therefore never executed, meanwhile the single emitter move has a good acceptance rate ensuring proper mixing.The background move has low acceptance, though it should still be enough for proper mixing.
Calculating the lateral and axial RMSE results in 0.05 pixels and 120 nm errors, respectively, giving us an accurate estimate of the emitter position.
It can be concluded that the initial position found from RJMCMC was sufficiently close to the optimum and the chosen hyperparameters and chain length were appropriate, though the sampling of intensity still shows correlation and σ I may need to be tuned more.

Low SBR results
For I sim = 500, the prior on intensity is given as follows: The prior is plotted in Supplementary Figure S4.This dataset samples intensities from N (500, 150) photons, with a background of 20 photons.

Model convergence
In Supplementary Figure S5 a) and c), the model time series and autocorrelation are plotted.Again, from the incorrect initalization the estimated model rapidly reduces to a two or one emitter model.There is less mixing and faster convergence compared to the simulations with I sim = 2000.Looking at the average autocorrelation, it reaches zero with a lag of just 250.Again, this indicates that the 30000 RJMCMC iterations are more than sufficient for determining the model.The histogram in Supplementary Figure S5 d) again shows that the chain spends virtually all its time in a single emitter model.
Parameter convergence In Supplementary Figure S5 b), the average parameter autocorrelation for the 50 frames is shown.The autocorrelations of the position chains take longer to reach zero compared to the high SBR case.This time, the axial chain is actually slower than the intensity chain, and now even the lateral chain displays some slow transient behavior.The background converges first, followed by intensity and lateral position, and finally axial position.Under these conditions, convergence is limited by the sampling of the z-position instead of the intensity.The acceptance rates for the moves are 0, 0.73 ± 0.04, 0.270 ± 0.009 .The single emitter move has a high acceptance rate and the corresponding random walk hyperparameters may be increased for better mixing and less correlation between samples.
Calculating the lateral and axial RMSE results in 2 pixels and 360 nm errors, respectively, which is a lot less accurate especially in lateral direction compared to the high SBR case.This is due to some of the outliers in the sampled intensity, for instance frames 18 and 45 having an intensity less than 100 photons (79 and 83, respectively).Not considering these two frames, intensities still range from 200 to 700 photons (covering about 85% of the interval), and the lateral and axial RMSE become 0.14 pixels and 340 nm each.
This test has shown that though the chain length is sufficient, tuning of the single emitter move may lead to faster convergence.
Finally, it can be concluded that the hyperparameters provided here generally result in convergence, though it is best practice to adjust them to the imaging conditions of your system, ideally making use of autocorrelation plots to identify sub-optimal behaviour.

Single emitter precision and accuracy for a tetrapod PSF
The same testing was done for a tetrapod PSF, using the same hyperparameters as in Supplementary Table S4 but with a PSF width of 2.1 pixels.At emitter intensities less than 1000 photons, some frames failed to converge, showcasing the difficulty of using a complex PSF at a low SBR.This may be due to the flat intensity prior yielding poor initializations and difficulty escaping local minima.The accuracy and precision are thus plotted over a range of 1000 to 3000 photons.Supplementary Figure S10 shows the CRLB and localization precision found as above.Note that the found precision for intensity breaks the CRLB, especially at higher intensity, despite the other parameter precisions tracking the CRLB well.The precision found when assuming a Gaussian distribution is thus not representative of the information present in the frame, at least not for the emitter intensity.Supplementary Figure S11 again shows the resampled RMSE for the same data.This time, all estimates behave as expected, broadly tracking the CRLB.

Multimodality in two emitter imaging
During testing of two emitter separability, under some conditions four peaks were found in the reconstruction of the emitter position.This occurred despite the chains in those situations converging to the correct MAP number of emitters.This phenomenon was observed at smaller distances between emitters, using the astigmatic PSF under relative angles of 0 and 90 degrees with respect to the x-axis, and for the tetrapod PSF under angles of 45 degrees.To understand if this was due to pseudoconvergence or if this was representative of the underlying posterior, a multimodal frame was analyzed, shown in Supplementary Figure S12.K-means clustering was used to find four potential localizations, shown together with the ground truth in Supplementary Figure S12 a).Then, by picking fixing one of these emitters and moving the other around the ROI at the same depth and intensity, the log likelihood was mapped, shown in Supplementary Figure S12 c).Fixing the rightmost emitter finds the global optimum to be a pair of emitters near the ground truth, while fixing the bottom emitter finds an alternate mode of the solution, a pair perpendicular to the ground truth.Note that this local optimum has virtually the same likelihood.To investigate if either mode could be representative of the data, a χ-squared test was done on a 95% confidence interval.The χ 2 value is calculated as follows: With N p the number of pixels, D i the measurement of the i th pixel, and µ i the expected value.As the frames are subject to Poisson noise, the χ 2 threshold for a 95% confidence interval can be found with: Applying it to this frame results in χ 2 threshold = 456, with the true and alternate modes returning a χ 2 of 407 and 408, respectively.Both modes thus are well within the 95% confidence interval and are representative of the frame.
To test whether it is possible to select the true mode from these hypotheses, the log likelihood ratios of the hypotheses are used to calculate the probability of error.This log likelihood ratio can be approximated by a Gaussian distribution: LLR q,r = ln P(D|H q ) P(D|H r ) (90) with LLR q,r the log likelihood ratio of hypothesis q over hypothesis r, µ q,r,s the mean of LLR q,r assuming hypothesis s is true, and σ q,r,s the corresponding width.For the two hypotheses H 0 and H 1 , P(LLR 0,1 |H 0 ) and P(LLR 0,1 |H 1 ) can now be calculated, as shown in Supplementary Figure S12 b).To retrieve the probability of error, we now use the probability of making a correct selection: P c = P(LLR 0,1 > 0|H 0 ) − P(LLR 0,1 < 0|H 1 ) (93) P e = (1 − P c )/2 (94) with P c the probability of being correct and P e the probability of error.Applying this to the two modes from Supplementary Figure S12 results in a probability of error of 0.498.It is therefore not possible to select the right mode solely given the data.
Testing the multimodal reconstructions found under different angles for the astigmatic and tetrapod PSF yielded similar results, where both pairs of emitter locations pass the chi-squared test and have greater than 49.5% probability of error.
In conclusion, while both modes are representative of the data, there is no distinction to be made based on their likelihood.This indicates that pseudoconvergence is not the case and the posterior distribution in this scenario is multimodal.As both PSFs show this multimodality under different conditions, this indicates PSF degeneracy.
. A collection of diagrams visualising the proposals that each move makes.a) The single emitter move changes the 3D position of an emitter, as well as its intensity.b) The group move finds a cluster of emitters and randomly and independently changes their positions and intensities.c) The background move changes the estimated background photon count.d) Split distributes the intensity of a random emitter over two new emitters.e) Merge randomly selects an emitter and searches within a given radius for an emitter to merge it with, combining intensities.f) Generalized split finds a cluster of N emitters and splits them off into N + 1 emitters.g) Generalized merge finds a cluster of N + 1 emitters to merge into N emitters.h) Birth constructs a residual image to use as a probability distribution for possible undetected emitter positions, generating a new emitter with a random position.i) Death removes a randomly selected emitter from the model.Prior on intensity.Used for synthetic data with intensities drawn from N (2000, 150).Model and parameter space convergence of the algorithm for high SBR frames.a) Autocorrelation of the model chain.b) Autocorrelation of the parameters.c) Time series plot of the model for all frames.d) Prior on intensity for low SBR.Used for synthetic data with intensities drawn from N (500, 150).
Model and parameter space convergence of the algorithm for low SBR frames.a) Autocorrelation of the model chain.b) Autocorrelation of the parameters.c) Time series plot of the model for all frames.d) Histogram of all visited models across all frames.Emitter intensities were sampled from N (500, 150), background was set to 20 photons.Emitters were placed in focal plane at the center of the ROI and given a sub-pixel shift drawn from U (−0.5, 0.5). 100 frames were used.Model and parameter space convergence of the algorithm for high SBR frames.a) Autocorrelation of the model chain.b) Autocorrelation of the parameters.c) Time series plot of the model for all frames.d) Histogram of all visited models across all frames.Emitter intensities were sampled from N (2000, 150), background was set to 20 photons.Emitters were placed in focal plane at the center of the ROI and given a sub-pixel shift drawn from U (−0.5, 0.5). 100 frames were used.Model and parameter space convergence of the algorithm for low SBR frames.a) Autocorrelation of the model chain.b) Autocorrelation of the parameters.c) Time series plot of the model for all frames.d)Histogram of all visited models across all frames.Emitter intensities were sampled from N (500, 150), background was set to 20 photons.Emitters were placed in focal plane at the center of the ROI and given a sub-pixel shift drawn from U (−0.5, 0.5). 100 frames were used.Violinplots of parameter estimates over intensity using astigmatic imaging, compared to the CRLB.a) Lateral precision violinplot, using an effective pixel size of 100 nm.b) Axial precision violinplot.c) Violinplot of the emitter intensity precision.d) Violinplot of the background intensity precision.Precisions were calculated from the MCMC chain of individual frames, assuming a Gaussian distribution.With a background of 20 photons, 50 frames were used per datapoint from 500 to 3000 emitter intensity photons.Emitters were placed in focal plane at the center of the ROI and given a sub-pixel shift drawn from U (−0.5, 0.5).Bootstrap resampled root mean square error (RMSE) results for the parameter estimates plotted over intensity using astigmatic imaging.a) Violinplot of the lateral RMSE.b) Violinplot of the axial RMSE.c) Violinplot of the emitter intensity RMSE.d) Violinplot of the background intensity RMSE.Results were resampled using 50% of the available data and 1000 runs.With a background of 20 photons, 50 frames were used per datapoint from 500 to 3000 emitter intensity photons.Emitters were placed in focal plane at the center of the ROI and given a sub-pixel shift drawn from U (−0.5, 0.5).Violinplots of the parameter estimates over intensity using a tetrapod PSF, compared to the CRLB.a) Lateral precision violinplot, using an effective pixel size of 100 nm.b) Axial precision violinplot.c) Violinplot of the emitter intensity precision.d) Violinplot of the background intensity precision.Precisions were calculated from the MCMC chain of individual frames, assuming a Gaussian distribution.With a background of 20 photons, 50 frames were used per datapoint from 500 to 3000 emitter intensity photons.Emitters were placed in focal plane at the center of the ROI and given a sub-pixel shift drawn from U (−0.5, 0.5).Supplementary FigureS11.Bootstrap resampled RMSE results for the parameter estimates using a tetrapod PSF, plotted over intensity.a) Violinplot of the lateral RMSE.b) Violinplot of the axial RMSE.c) Violinplot of the emitter intensity RMSE.d) Violinplot of the background intensity RMSE.Results were resampled using 50% of the available data and 1000 runs.With a background of 20 photons, 50 frames were used per datapoint from 500 to 3000 emitter intensity photons.Emitters were placed in focal plane at the center of the ROI and given a sub-pixel shift drawn from U (−0.5, 0.5).Analysis of multimodality in astigmatic PSF 3D reconstructions for the two emitter problem.Ground truth positions are marked with blue dots, while localizations are marked with green crosses.a) Single frame reconstruction for the two emitter problem.b) Plots of the log likelihood ratios between the hypothesised modes of the reconstruction in a), using a Gaussian approximation.c) Left: fixing the rightmost localization, an emitter is shifted over the ROI, imaging the log likelihood and marking the maximum (red star).Right: the bottom emitter is fixed, and a local optimum is found in the black cross.This finds the pair of modes used in b), one on the ground truth and one perpendicular to it.Calculating the probability of error yields a 49.8% chance of selecting the wrong hypothesis.The expected value of either mode passes a chi-squared test on a 95% confidence interval.